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1. Introduction 

The future GOES-R Geostationary Lightning Mapper (GLM) will have many noteworthy characteristics 
including high detection efficiency, continuous monitoring of both ground and cloud flashes, and a wide (nearly 
hemispheric) field-of-view coverage. By contrast, the National Lightning Detection Network™ (NLDN) system has 
exceptional detection efficiency only for ground flashes over the continental US [Cummins et al., 2006; Murphy et 
al., 2006]. 

In this work, a fundamental question is asked: Can GLM space-based optical measurements be used to 
discriminate ground flashes from cloud flashes? Continuous knowledge of the ratio of cloud flashes to ground 
flashes (or “Z-ratio”) derived from GLM data would provide a better understanding of thunderstorm dynamics, 
intensification, and evolution, and would improve the value-content of GLM data for severe weather warning. 
Knowledge of the Z-ratio for lumped regional flashes would also benefit lightning climate studies. 

Statistical evidence suggests that the optical parameters of ground and cloud flashes are, on average, 
significantly different [Koshak 2007]. While no specific statements can be made from GLM data alone, it may be 
possible to give the probability that a given flash is a ground or cloud flash, conditioned on the evidence provided by 
GLM and additional GOES-R observations. Furthermore, it would be desirable to improve the flash type 
discrimination methodology by validating flash type results with NLDN data; this allows the methodology to 
“learn.” 

Following Koshak [2007], a Bayesian Network (BN) is suggested for use as a flash type discriminator. In a BN, 
graphical methods are employed to determine conditional independencies, and using these, a convenient form for the 
joint probability distribution is derived. Certain conditional probabilities must be known to execute BN calculations. 
These can be estimated using Optical Transient Detector (OTD) or Lightning Imaging Sensor (LIS) data. 

In this work, we conducted initial tests using conditional probabilities for five OTD flash- level parameters 
(radiance, area, duration, # of optical groups, # of optical events) obtained from Koshak [2007]. These tests 
indicated that there is too much overlap in the conditional probability distributions between ground and cloud 
flashes to effectively discriminate flash type for individual flashes based on these five parameters. Additional data 
minin g efforts are needed to intercompare additional ground and cloud flash characteristics. However, because of 
the central limit theorem, the overlap problem is substantially reduced by converting the conditional probability 
distributions into distributions of the means. This allows us to estimate the fraction of ground flashes in a large 
sample of flashes (for example, as accumulated in a single latitude/longitude bin for climate studies). 

Finally, we note several advantages of the BN analyses. For example, message passing algorithms exist that 
allow for efficient calculation of the required probabilities [Pearl 1988]. Comparisons between proposed 
distributions and between structures are facilitated by distance measures [Jensen 1996]. The probabilities of each 
piece of evidence may be obtained by marginalizing. Correct findings should be positively correlated. The joint 
probability of all of the evidence should exceed the product of probabilities of independent findings. This provides a 
way of detecting conflicting data. 

In this effort, we adopt the familiar convention that random variables are denoted by upper case letters. Their 
lower case counterparts will represent particular states of the associated random variables. Vectors and matrices will 
be denoted by bold roman letters. Probability parameters will generally be written as 0. 

2. Prior and Posterior Probabilities 

In Koshak [2007], the average parameters: radiance (J/m 2 /ster/pm), area (km 2 ), duration (sec), # groups 
(integer) and # events (integer) are compared for ground and cloud flashes. Each average ground flash parameter 



was roughly numerically twice that of its cloud flash counterpart. Standard z scores ranged from about 40 to 65 
indicating very high confidence for rejection of any hypotheses involving equality of the means. Very large sample 
sizes, ranging from tens to hundreds of thousands of lightning flashes, taken over several years were used. It was 
suggested that the probability that a flash is a ground flash could be updated given one of the observed optical 
parameters. In other words, the probability that flash type is a ground flash (T= g ) given an observed radiance R = r 
could be written as 


P{g\r) = 


P(r I g)-P(g) 

[P(r\g)P{g) + P(r\c)P(c)X 


( 1 ) 


where T - c indicates a cloud flash. Here, P(g) is called the prior probability and can be estimated using the Z-ratio 
results obtained in Boccippio [2001], P(g | r) is referred to as the posterior probability. The suggestion in (1) is 
generalized in the following section. 


3. Bayesian Networks 

In a BN, the variables are represented graphically as nodes. Arrows indicate the parentage (causes) for each 
node. From the graph, one is able to determine conditional independencies, and calculate the joint probability 
distribution. The output will be a conditional probability similar to that given in (1), but conditioning will be on a 
vector of optical parameters rather than just radiance. The required inputs consist of conditional probability tables 
supplying the probabilities of the evidence given the causes. 

Advantages of Bayesian updating are given in Hopgood [2001]: (i) The technique is based on a proven 
statistical theorem, (ii) The result is expressed as a conditional probability which has a clearly defined and familiar 
meaning, (iii) The probability of a hypothesis can be updated in response to more than one observed variable. 
Disadvantages include: (i) The prior probability of an assertion must be known or estimated, (ii) Conditional 
probabilities must be known or estimated, (ii) Certain assumptions of independence may be unfounded. The large 
amount of data available and the strong correlations it exhibits allow for good estimates of the priors. Disadvantages 
(i) and (ii) are not expected to present difficulties. As for disadvantage (iii), all models depend on assumptions. The 
conditional independencies assumed here appear most reasonable. 

Again, flash type will be represented by T. This is a binary variable which can take on states c and g for a cloud 
and ground flash, respectively. Provision will be made for variables that we may need to incorporate later; i.e., such 
as observations from the GOES-R Advanced Baseline Imager (ABI), or other GLM products. All of these will 
comprise the latecomers L. The remaining variables will be those quantities that are measured, M. The random 
vector M is given by 

M = {I,0,R,A,D,U,V), (2) 

where I and O are ice content and optical depth as measured by ABI, and R, A, D, U, V represent radiance, area, 
duration, number of groups, and number of events respectively. Each is a random variable which can take on a finite 
number of mutually exclusive states. Unlike in elementary probability, the variables are not events (subsets of the 
sample space). The probability of R — r and A = a will be written P(r, a). The output desired of the network will be 
the entries in the table 

P(7\L,M) (3) 

for all of the states of each of the variables. By marginalizing, we can determine P(T | M). 

The definition for conditional probability [Hogg and Tanis, 1997] 

P(X\Y) = P(X,Y)/P(Y), (4) 

may be used to obtain the chain rule of probability 

P(X l ,X 2 ,...,X n ) = P(X l \X 2 ,...,X„)P(X 2 ,...,X n ) 

= P(X t \X 2 ,.:.,X n )P(X 2 \X 3 ,...,X n )...P(X n ). (5) 


2 



Conditional independencies can be used to simplify the result. 

There are 3 possible connections in a network, serial, diverging, and converging. See Fig. 1 


Serial Connection 


Diverging 

Connection 


Converging 

Connection 


Fig. 1 

The shaded variables are observed. In the serial and diverging connections, X and Z are independent given Y\ 
P(X | 7,Z) = P(X | Z). For a converging connection, X and Z are independent (i.e., 
P(X,Z) = P(X)P(Z)) unless Y or one of Fs children are observed. Generalization to connections with more 
than 2 parents or children is immediate. 

Presently, we assume the following structure for the network. 



Directed Acyclic Graph 
Fig. 2 
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Explicitly, from the chain rule we have 


P(.T ,I,0,R,A,D,U,V) 

= P(T)P(I | T)P(0 1 T,I)P(R | T,I,0)P{A,\T,I,0,R)P(D \ T,I y O,R,A) (6) 

xP(U | T, 1,0, R, A, D)P(V | T,I,0,R, A, D,U). 

Conditional independencies may be read directly from the Directed Acyclic Graph (DAG). The result is 

P{T,I,0,R,A,D,U,V) 

= P(T)P([)P(0 1 1)P(R | T, 1, 0)P(A | T,1, 0)P(D \ T, 1,0) (7) 

xP(U\T,I,0)P(V\T,I,0). 

We are only interested in a particular conditional probability in order to be able to discriminate cloud and ground 
flashes. It will turn out that we will not need P(I, O); 

Pij\i 2 o 2 R 1 A 1 D 1 uy) 

P(I,0,R,A,D,U,V ) 

= P(r)P(j? I T,I,0)P(A I T,I,0)P(D \T,j,Q)P(l j \T ,I,Q)P(V | T,I,Q) ( g ) 

~ I P(T)P(R \T,I,0)P(A | T,1, 0)P(D | T, 1, 0)P{U \T,I,0)P(V\T,I,0) 

T=g,c . 

The sum consists of only 2 terms. The tables might become large. Most are 4-dimensional arrays. For each lightning 
flash, the probability is obtained with only 12 operations (multiplications and additions). 

The calculations above assume that exactly one state will be known for each of the measured variables in M. 
Evidence can consist of any number of states. In order to compress the notation, let M j denote the /th component 

of M and m k - , k = 1, 2, . . . ,t j be the Ath state of variable M j . Each variable is allowed to have a different number 

of states. Evidence on Mj will be denoted by E - {m . ) = mj' V mj 2 v . . . v . The v's are logical 
symbols for “or”. Each is an integer in {l, 2, . . . , t . }. It represents the Ath state of M j allowed by the evidence. It 
was assumed that the evidence allows q j states of M j . The set of all of the evidence vectors will be written as 

e(M). It is expected that the evidence will consist of exactly one state of M j for each j. Elowever, it is not difficult 

to allow for several states. Passing to the limit allows for direct extensions from discrete states to intervals for 
continuous variables. 

From elementary probability theory, P{x V y) = p(x) + P(y) - P(x A y) . Since the variables can only take 
on mutually exclusive states, the last term is zero and we have 

p[e j (nij )] = p[m!' v m’p v . . . v mf ‘ ) = ^ p{tnf ). (9) 

k = 1 

For the probability of T given evidence, we have 


P(T\I,0,R,A,D,U,V) 
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P(T 1 8) = P(T | m e { n v ... v n\ Xqx m e n nl v . . . v m ^ n ) 

m e ;-) 

= (10) 

T k , £„=1 

X p ( r ’ L > M ) 

_ _e L,e 

~^P(T,M) ^ P(T, L,M)' 

7\e T, L,e 

By taking maxima over the variables instead of sums, one can obtain a most probable configuration of variables 
given the evidence. This may be useful if some nuisance variables appear in L that cannot be measured or otherwise 
estimated. 

4. Result #1: Determining Flash Type of a Single Flash 

Since ABI data is not yet available, tests have been performed using (8) and the DAG in Fig. 2 with the I and O 
nodes omitted. The required conditional probabilities were obtained by using OTD data. When a prior probability of 
a lightning flash being a ground flash of 0.25 was used, the average values of R, A, D, U, V for a ground flash 
yielded a posterior probability of about 0.32. Using the average values for a cloud flash the posterior probability was 
decreased to about 0.22. While the change in the prior is significant for both cases, it is impractical to use the results 
to discriminate individual flashes since the results are not near unity (ground flash) or zero (cloud flash). This 
ambiguity is fundamentally linked to the fact that the parameter distributions for ground and cloud flashes overlap to 
a significant degree. Therefore, additional data mining of the optical characteristics of flashes will be needed (and 
possibly ABI data) to see if the BN can discriminate flash type on a flash-by-flash basis. Of course any such 
discrimination is fundamentally statistical. 

5. Result #2: Determining the Fraction of Ground Flashes in a Set of Flashes 

Rather than interrogating individual flashes, we can consider a set of flashes and then estimate what fraction a 
of these are ground flashes. The same method applies; one only need change 7' to a and replace the R, A, D , U, V, by 
their average values for the set of lightning, 

P(a I L,M) = P(a I L)P(M I a)/X a .-P(«' I L)P(M | a'). (ii) 

For initial testing, we removed / and O, and we eliminated L. If we assume that all values of a are equally likely, an 
assumption known as the ignorance prior , the result is the simple formula 

P(a | M) = P(M | «)/X„.«M | «'). (12) 

Conditional probabilities for the average parameters P(M | a) may be estimated by resampling the OTD data 
for averages. It is also possible, in the case of large sets of lightning, to approximate these probabilities using normal 
distributions by appealing to the central limit theorem. Using a theorem for statistics, the sample mean for any of the 
average variables is distributed with mean and variance given by 

.. . 2 (l-ar)cr? +aat 

ju = (l-a)ju c +otfi g , a 2 = s -S-, ( 13 ) 

n 

2 2 

where fl c , jJ g , <7 C , O g denote the means and variances for a variable associated with pure cloud flashes and 
pure ground flashes respectively, and n is the sample size. 
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Eq. (12) was applied to randomly selected sets of lightning flashes, each set having a distinct value of a. The 
retrieval error (deltaALPHA) in a; as a function of a, is given in Fig. 3. Each set contained n = 10,000 flashes, and 
the errors were generally within 8% of the actual value of a for the set. 



Error Plot for Retrieved Mixture 
Fig. 3 


Further refinements including geographical information, / and O parameters and perhaps the availability of 
additional parameters observed by GLM should further improve the results. Fig. 4 is the same experiment, but 
shows the mean retrieval errors obtained for 100 analyzed sets at a particular or, the vertical lines represent standard 
deviations about the mean. 

Given the small retrieval errors in Figs. 3 and 4, we believe that this approach would be valuable in determining 
the ratio of cloud to ground flashes in a gridded lightning climatological dataset where any one geographical bin 
defines a set of lightning flashes with large sample n. 
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Mean Error Plot for Retrieved Mixture 
Fig. 4 


6. Conclusions 

We have examined the feasibility of employing a Bayesian Network (BN) for discriminating lightning flash 
type (ground or cloud) using the statistics of certain flash optical parameters (e.g., radiance, area, duration, # optical 
groups, # optical events). It was found that the distribution of any parameter for a ground and cloud flash have 
sufficient overlap so as to make discrimination of flash type difficult on a flash-by-flash basis. This implies that 
flash type discrimination for a single flash must involve more parameters than the five we have examined here. 
However, any of our 5 parameters do have mean values that differ between ground and cloud flashes. Moreover, we 
have found that the distributions of the mean for any of these five parameters involve much less overlap between the 
ground and cloud flashes. In the BN framework, this implies that it is conceivable to discriminate on a set of n 
flashes. That is, given a set of flashes collected over a certain time period and composed of so many ground and 
cloud flashes, the BN could estimate the fraction, a, of ground flashes and the fraction (l -a) of cloud flashes. This 
paper has shown that the BN evidently can determine or for large n to within a reasonable percent (see section 5). 
Hence, the primary application of this approach would be for categorizing the ratio of cloud flashes to ground 
flashes in lightning climatology datasets. 

The BN approach is not static. We can begin with estimates of conditional probabilities necessary to start the 
calculation, but the network can learn from GLM and NLDN data. Methods are available for parameter lea rnin g 
using complete and incomplete data sets (see Appendix). In the case of incomplete data, an approximate method 
may be used if the exact method becomes computationally expensive. 
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APPENDIX 


A.1 Methods for BN Calculation 

A number of computer programs are available for performing BN calculations. A summary of software 
packages may be found in an appendix of [Korb and Nicholson 2004], A popular algorithm used by many packages 
is a message passing scheme using junction trees [Pearl 1988, Jensen 1996]. 

Large BN’s can be computationally expensive. One method to improve efficiency is to group subsets of nodes 
in such a way that the conditional independencies can be exploited. The directions in the DAG are suppressed and 
connections are made between nodes that share common children. The result is known as a moral graph. See Fig. 5 
for a moral graph corresponding to the DAG of Fig. 2. 



Moral Graph 
Fig. 5 
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Cycles of length greater than 3 must be removed by adding links. This is called trangulation. The proposed lightning 
BN’s moral graph is already triangulated. 

Nodes are selected one at a time, and for each node a clique is formed by considering all of that node’s 
neighbors that are pair wise joined. The node under consideration is then removed together with its links. This 
process is repeated until every node is a subset of some clique. The cliques are connected with separators that 
comprise the intersections of the cliques. Messages containing joint probabilities are passed between the cliques 
through the separators. No message needs more than one path. Therefore, redundant separators may be deleted. The 
result is known as a junction tree. A junction tree has the property that for each pair of cliques ^and J' all paths 
between ^Tand ^contain A junction tree for the proposed lightning BN is very simple; see Fig. 6. 



Junction Tree 
Fig. 6 

Associated with each clique, is the joint probability distribution of the variables therein. For example, for the 
clique RTIO, we would have P(T)P(I)P(0 \ I)P(R \ T> /, O). Initially, an appropriately sized table of ones is 
associated with each separator. In this case, we would use 2 xnjxn 0 tables, where n f and n 0 are the number of states 
of I and O respectively. For each consecutive pair of cliques ^and J'with separator ^ a message is passed from 

^via ^by replacing the initial table /^associated with P'by a table t* y with entries . A table U 

corresponding to ^ whose entries are P (3) is replaced with jt ^ . Once messages are passed in both directions, 
the junction tree is said to be consistent. Evidence is entered by setting all entries not allowed by the evidence equal 
to zero. Renormalization and a round of message passing allow for all of the relevant information to be collected in a 
single clique. The required probability can be obtained by interrogating the table for this clique. 

A.2 Verifying Applicability 

If there are no data errors, the evidence should be such that the joint probability of all of the findings should 
exceed the product of the probabilities of independent findings. One way of measuring the amount of conflict is the 
data is to calculate 

ln[p(f, ). . . P(f, )/P(e)]. (14) 

A positive value of (14) indicates a possible conflict of data. A similar calculation can be preformed for each clique 
or group of cliques in order to trace flawed evidence. 

Let P and P* be 2 joint distributions of the same variables. A distance measure between distributions is defined 
by 

D(P,P*)=X[P(^)-P*(^)] 2 , (15) 

.Te// 
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where the summation extends over all configurations JTin the universe of variables A number of other measures 
of distance between distributions are also in use. See [3] . 

A.3 Learning 

In order to calculate (8), a number of conditional probabilities must be known. These may be estimated or 
assumed. It is desirable to be able to update these conditional probabilities using experience gained from test cases 
or actual cases for which the output can be verified. Such updating is known as parameter learning. Using data to 
update structure is also available in the literature [Heckerman 1996], but it will not be discussed here. 

The notation will be compressed by letting X = {T, L, M) be a vector of all of the variables in the DAG; X x =T, 
X 2 = L h ... . If 7,ZcI, then 

/> ( Y l z >=I sy ,,AX)/£ v/ ./>(X). ( i6) 

Everything that is needed can be obtained once the joint probability distribution is known. From the chain rule (5), 

i>(X)=np(jr,.| P a,.), 07) 

where pa, is the parent set of X h We will adopt the notation and methods in [Heckerman 19968]. Conditioning on 
the probability parameters and the network structure S gives 

AX 1 0, 5) = n P(X t | pa, , 0, , 5). (18) 

Here, Assume that each X t can take on r,- states each denoted by xf,k = l Each 

variable in pa, can also take on a number of states. Denoting each state of each variable with a superscript makes the 

notation too cumbersome. Instead, se index each configuration within pa, using pa/ , j = 1, . . . , q t . Each factor in 

(15) represents a probability table for A] being in state k and its parent set pa,- being in configuration j. The individual 
probabilities are written as 

P(*f|pa/,e,,s)=^; 

0 n e,=(e„,..,ej. 

' k-2 

These are the entries in the probability tables that will required. Their values must be given initially, and they will 
be updated based on the data. 

Let D = {x b ... ,x N } be a random sample containing N observations of all n variables. If there are N iJX 
occurrences of x] given that its parent set is in configuration j , N ij2 occurrences of xf given that its parent set is in 
configuration^ etc. in D, then assuming a multinomial distribution, 

p(D|e»,s)-^^* , .,.V"- (20) 

The maximum likelihood configuration of the parameters 0 ijk is given by 

9 m= N ,jkjN,j, Nj=^N m . (21) 

For convenience, a Dirichlet distribution is assumed for the parameters 
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/>(e,|s)=Dir(e # | a, y ); a v =(a lfl ,...,a IJr ), 


Dir(0 


,j\ «#;=-: 




nrkj 


\(\—0 — -/9 K 1_1 0 a i/2 _1 ff a m~ X 

V U ij2 *•* y’2 


This is a prior distribution for the parameter set. The Dirichlet hyperparameters (Xt jk for any complete structure are 
constrained by 

a ijk = aP(x- ,pa' | s), (23) 

where a is an equivalent sample size. The posterior probability depends on the sample D, 

V ,J ' ’ ’ P{D\S) 


r 

( n 

L a m 

l*- 1 

( 

p(d 1 5)nr(, 



(i-o IJ2 -...-0 iin y^0 IJ 


a ij 2 +N ij 2~1 Q a ijrj +jV i it) -1 


= Dir(e s |a, } +N,). 

The quantity in brackets was recognized as fT [a ijk + N yk ) j YlT{a ljh +N iJt ) since this is a probability 


density function, and it must integrate to one. 

The probability of obtaining a given observation x^+i (after the N observations in D) is given by 

P( Xiv+1 | A5) = I/>(x w+ ,,0| D,S)dQ 

=jp{x N+i \e,D,s)p{d\D,s)de 


=jn[p(xf' |paf,e ft ,As)]nn[p(e„ |D,5)]de w. 

= nfp(xf- | P af,0,j ,D,s)p{(j jj \D,s)dQ lj n\\P(Q u I £>,5)]^} 

We are taking each variable X { in x^+j as appearing in a given state k t with its parent set in configuration j); i.e., each 
state k and configuration^ depends on /. Substituting (19) and using the fact that the integrals in the last product are 
all ones leaves 


p(x w+l \D,s)=n\e i]k p(e„ | d, s)m V] . 

Substituting (19) and performing the integration results in 


p(x N jD,s)=h aii ’ k ‘ +N J' k ‘ ■ 


n — ^ M-* a =\ a 

i=i a +N 9 lJi ^ a ' jik ' 

U ij, k=l 
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The experience in D has contributed in the expected manner in predicting the outcome of the next sample. 

A.4 Incomplete Data 

The development in the previous section assumed complete data sets. Some corruption of data should be 
anticipated. If Y, Z cr X are the observed and unobserved variables in a given case (X denotes the compete set of 
variables). The posterior distribution of Qy may be obtained using [Spiegelhalter and Lauritzen 1990] 

p{% |y,s)=i>(x‘,pa/ |y,sMe, kf,pa/^)+[i-p(pa/| y) 5)]p(e,|4 ( 28 ) 

k=\ 

Unfortunately, each of the probabilities involving 9# are Dirichlet distributions for every case. As the number of 
cases grows, the number of Dirichlet functions proliferate making computation impractical. This formula was 
included since it is expected that the GLM being new should rarely yield bad or missing data. 

If using (28) becomes impractical, a Gaussian approximation may be used. It consists of expanding 
ln[P(0 | /),£)] hi a Taylor series about the maximum likelihood estimate of 0. The first derivative term will be 
zero, and the second will involve quadratic terms. Exponentiating results in a multivariate Gaussian distribution. 
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